77 research outputs found

    Query-Based Keyphrase Extraction from Long Documents

    Transformer-based architectures in natural language processing impose input size limits that can be problematic when long documents need to be processed. This paper overcomes this issue for keyphrase extraction by chunking the long documents while keeping a global context as a query defining the topic for which relevant keyphrases should be extracted. The developed system employs a pre-trained BERT model and adapts it to estimate the probability that a given text span forms a keyphrase. We experimented with various context sizes on two popular datasets, Inspec and SemEval, and a large novel dataset. The presented results show that, on long documents, a shorter context with a query outperforms a longer one without the query.

    Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

    We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which, given a claim and a set of retrieved evidences, jointly learns to identify: (i) the evidences relevant to the given claim, (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way: the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contributions of evidences towards the final predicted probability can be identified. In per-evidence relevance probability, our model can further distinguish whether each relevant evidence is supporting (S) or refuting (R) the claim. This makes it possible to quantify how much the S/R probability contributes to the final verdict or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results competitive with state-of-the-art on the FEVER dataset, as compared to typical two-stage system pipelines, while using significantly fewer parameters. It also sets new state-of-the-art on FAVIQ and RealFC datasets. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using coarse-grained supervision, and we demonstrate it in two ways. (i) We show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision. (ii) Traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark, TLR-FEVER, focusing on token-level interpretability: humans annotate tokens in relevant evidences they considered essential when making their judgment. Then we measure how similar these annotations are to the tokens our model is focusing on.

    Populations of Stored Product Mite Tyrophagus putrescentiae Differ in Their Bacterial Communities

    Citation: Erban, T., Klimov, P. B., Smrz, J., Phillips, T. W., Nesvorna, M., Kopecky, J., & Hubert, J. (2016). Populations of Stored Product Mite Tyrophagus putrescentiae Differ in Their Bacterial Communities. Frontiers in Microbiology, 7, 19. doi:10.3389/fmicb.2015.01046. Background: Tyrophagus putrescentiae colonizes different human-related habitats and feeds on various post-harvest foods. The microbiota acquired by these mites can influence the nutritional plasticity in different populations. We compared the bacterial communities of five populations of T. putrescentiae and one mixed population of T. putrescentiae and T. fanetzhangorum collected from different habitats. Material: The bacterial communities of the six mite populations from different habitats and diets were compared by Sanger sequencing of cloned 16S rRNA obtained from amplification with universal eubacterial primers and using bacterial taxon-specific primers on the samples of adults/juveniles or eggs. Microscopic techniques were used to localize bacteria in food boli and mite bodies. The morphological determination of the mite populations was confirmed by analyses of CO1 and ITS gene fragments. Results: The following symbiotic bacteria were found in the compared mite populations: Wolbachia (two populations), Cardinium (five populations), Bartonella-like (five populations), Blattabacterium-like symbiont (three populations), and Solitalea-like (six populations). Of 35 identified OTUs97, only Solitalea was identified in all populations. The next most frequent and abundant sequences were Bacillus, Moraxella, Staphylococcus, Kocuria, and Microbacterium. We suggest that some bacterial species may occasionally be ingested with food. Bacteriocytes were observed in some individuals in all mite populations. Bacteria were not visualized in food boli by staining, but bacteria were found by histological means in the ovaria of Wolbachia-infested populations. Conclusion: The presence of Blattabacterium-like, Cardinium, Wolbachia, and Solitalea-like bacteria in the eggs of T. putrescentiae indicates mother-to-offspring (vertical) transmission. Results of this study indicate that diet and habitats influence not only the ingested bacteria but also the symbiotic bacteria of T. putrescentiae.